NSF PAR Search | NSF Public Access Repository

Preference learning made easy: Everything should be understood through win rate

Zhang, Lily H; Ranganath, Rajesh (July 2025, International Conference on Machine Learning (ICML 2025))

Preference learning, or the task of aligning generative models to preference comparison data, has yet to reach the conceptual maturity of classification, density estimation, etc. To close this gap, this work presents a framework to understand preference learning starting from the sampling distribution of pairwise preference data. First, we prove that the only evaluation of a generative model that respects both preferences and prevalences in the data distribution is a form of win rate, justifying win rate as the focal point to understand preference learning. We then analyze preference learning methods as win rate optimization (WRO) or non-WRO. We present novel instances of WRO beyond existing examples (RLHF, NLHF) and identify two key theoretical benefits of all such methods. We prove that common non-WRO methods like DPO and SFT on preferred samples lack these properties and suggest ways to mitigate such theoretical limitations. We also show that WRO underperforms in practice due to optimization difficulties and that optimization success predicts performance better than choices which affect the objective's solution. Our analysis highlights best practices for existing methods and provides recommendations for future research, guided by the principle that one should either align non-WRO methods more closely with WRO or improve the optimization of WRO objectives.
Free, publicly-accessible full text available July 13, 2026
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to Do

Wald, Yoav; Goldstein, Mark; Efroni, Yonathan; Amsterdam, Wouter; Ranganath, Rajesh (April 2025, International Conference on Learning Representations)

Free, publicly-accessible full text available April 24, 2026
Explanations that reveal all through the definition of encoding

Puli, Aahlad; Nguyen, Nhi; Ranganath, Rajesh (December 2024, Neural Information Processing Systems)

Full Text Available
Explanations that reveal all through the definition of encoding

Puli, Aahlad; Nguyen, Nhi; Ranganath, Rajesh (December 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024).)

Feature attributions attempt to highlight what inputs drive predictive power. Good attributions or explanations are thus those that produce inputs that retain this predictive power; accordingly, evaluations of explanations score their quality of prediction. However, evaluations produce scores better than what appears possible from the values in the explanation for a class of explanations, called encoding explanations. Probing for encoding remains a challenge because there is no general characterization of what gives the extra predictive power. We develop a definition of encoding that identifies this extra predictive power via conditional dependence and show that the definition fits existing examples of encoding. This definition implies, in contrast to encoding explanations, that non-encoding explanations contain all the informative inputs used to produce the explanation, giving them a "what you see is what you get" property, which makes them transparent and simple to use. Next, we prove that existing scores (ROAR, FRESH, EVAL-X) do not rank non-encoding explanations above encoding ones, and develop STRIPE-X which ranks them correctly. After empirically demonstrating the theoretical insights, we use STRIPE-X to show that despite prompting an LLM to produce non-encoding explanations for a sentiment analysis task, the LLM-generated explanations encode.
Full Text Available
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities

Saporta, Adriel; Puli, Aahlad; Goldstein, Mark; Ranganath, Rajesh (December 2024, Neural Information Processing Systems)

Full Text Available
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities

Saporta, Adriel; Puli, Aahlad; Goldstein, Mark; Ranganath, Rajesh (December 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))

Contrastive learning methods, such as CLIP, leverage naturally paired data (for example, images and their corresponding text captions) to learn general representations that transfer efficiently to downstream tasks. While such approaches are generally applied to two modalities, domains such as robotics, healthcare, and video need to support many types of data at once. We show that the pairwise application of CLIP fails to capture joint information between modalities, thereby limiting the quality of the learned representations. To address this issue, we present Symile, a simple contrastive learning approach that captures higher-order information between any number of modalities. Symile provides a flexible, architecture-agnostic objective for learning modality-specific representations. To develop Symile's objective, we derive a lower bound on total correlation, and show that Symile representations for any set of modalities form a sufficient statistic for predicting the remaining modalities. Symile outperforms pairwise CLIP, even with modalities missing in the data, on cross-modal classification and retrieval across several experiments including on an original multilingual dataset of 33M image, text and audio samples and a clinical dataset of chest X-rays, electrocardiograms, and laboratory measurements. All datasets and code used in this work are publicly available.
Full Text Available
What’s the score? Automated Denoising Score Matching for Nonlinear Diffusions

Singhal, Raghav; Goldstein, Mark; Ranganath, Rajesh (July 2024, International Conference on Machine Learning Research)

Full Text Available
Preference Learning Algorithms Do Not Learn Preference Rankings

Chen, Angelica; Malladi, Sadhika; Zhang, Lily H; Chen, Xinyi; Zhang, Qiuyi; Ranganath, Rajesh; Cho, Kyunghyun (October 2024, 2024 Conference on Neural Information Processing Systems)

Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited. In this work, we study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs, measured via ranking accuracy. Surprisingly, we find that most state-of-the-art preference-tuned models achieve a ranking accuracy of less than 60% on common preference datasets. We furthermore derive the idealized ranking accuracy that a preference-tuned LLM would achieve if it optimized the DPO or RLHF objective perfectly. We demonstrate that existing models exhibit a significant alignment gap -- i.e., a gap between the observed and idealized ranking accuracies. We attribute this discrepancy to the DPO objective, which is empirically and theoretically ill-suited to fix even mild ranking errors in the reference model, and derive a simple and efficient formula for quantifying the difficulty of learning a given preference datapoint. Finally, we demonstrate that ranking accuracy strongly correlates with the empirically popular win rate metric when the model is close to the reference model used in the objective, shedding further light on the differences between on-policy (e.g., RLHF) and off-policy (e.g., DPO) preference learning algorithms.
Full Text Available
Towards Minimal Targeted Updates of Language Models with Targeted Negative Training

Zhang, Lily; Ranganath, Rajesh; Tafvizi, Arya (June 2024, Transactions on machine learning research)

Full Text Available
Stochastic Interpolants with Data-Dependent Couplings

Albergo, Michael S; Goldstein, Mark; Boffi, Nicholas M; Ranganath, Rajesh; Vanden-Eijnden, Eric (July 2024, International Conference on Machine Learning)

Full Text Available
